Recap of first week with Netvizz


In [3]:
import os
import pandas as pd

In the first week of the social media scraping tool track, we used Netvizz.

It has six different modules:

  • group data
  • page data
  • page like network
  • page timeline images
  • search
  • link stats

Netvizz gave us several files for each page. For example:


In [4]:
ethospagedatadir = "page_126517697403099_2017_10_31_14_48_28"

In [5]:
os.listdir(ethospagedatadir)


Out[5]:
['page_126517697403099_2017_10_31_14_48_28.gdf',
 'page_126517697403099_2017_10_31_14_48_28_comments.tab',
 'page_126517697403099_2017_10_31_14_48_28_fanspercountry.tab',
 'page_126517697403099_2017_10_31_14_48_28_fullstats.tab',
 'page_126517697403099_2017_10_31_14_48_28_statsperday.tab']
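Since the four .tab files share one naming pattern, they could all be loaded in a single pass. The following is only a sketch: the helper name load_tab_files and the choice of the file name's last underscore-separated part as dictionary key are assumptions made for illustration, not something Netvizz provides.

```python
import os
import pandas as pd

def load_tab_files(datadir):
    """Read every tab-separated .tab file in datadir into a dict of DataFrames."""
    tables = {}
    for filename in sorted(os.listdir(datadir)):
        if not filename.endswith('.tab'):
            continue  # skip the .gdf graph file and anything else
        # short key taken from the end of the file name, e.g. 'comments'
        key = filename[:-len('.tab')].rsplit('_', 1)[-1]
        tables[key] = pd.read_csv(os.path.join(datadir, filename), sep='\t')
    return tables
```

With the directory above, load_tab_files(ethospagedatadir) would return a dict with the keys 'comments', 'fanspercountry', 'fullstats' and 'statsperday'.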

We then explored the shape and nature of the data to see what we had received from Netvizz.

The .gdf graph file contains nodes


In [67]:
# A GDF file holds the node table first, then the edge table.
# In this export the node table has 416 rows, so the edge header is on line 418.
ethosgdf = os.path.join(ethospagedatadir, 'page_126517697403099_2017_10_31_14_48_28.gdf')
ethosg = {'nodes': pd.read_csv(ethosgdf, nrows=416),
          'edges': pd.read_csv(ethosgdf, skiprows=417)
         }

In [68]:
ethosg['nodes'].sample(3)


Out[68]:
nodedef>name VARCHAR label VARCHAR type VARCHAR type_post VARCHAR like_count INT comment_count INT reactions_count INT engagement INT post_published VARCHAR post_published_unix INT shares INT post_id VARCHAR post_link VARCHAR
5ed2d7b34596e1ec2e0c30fa8860d45a25c756d0 aee5080c2c2736359328464ba278dff74c7d020e user user 1 0 1 1 NaN NaN NaN NaN NaN NaN
f852e1ea223f213e72099eb4a49438ddfa3c97b0 3d8e9d302eeddd7589a2a73b8e2dfaecea7bf1b4 user user 1 0 1 1 NaN NaN NaN NaN NaN NaN
b03f84ceb50905e559a365aee6da8c65710cbc9e Everyone can learn to code! 🙂 post video 27 0 34 34 2017-08-23T07:35:09+0000 1.503474e+09 0.0 126517697403099_1395602890494567 https://www.facebook.com/bbcnews/videos/101549... NaN

and edges in the same file


In [63]:
ethosg['edges'].sample(3)


Out[63]:
edgedef>node1 VARCHAR node2 VARCHAR weight INT directed BOOLEAN
182 a3ff0663c212ec05885d267b961344254f3971d3 46af478416f60768587acd0983852716432fcbbb 1 True
494 12553163662409cf01a21e56a6cb81dcbd8f2dae b03f84ceb50905e559a365aee6da8c65710cbc9e 1 True
105 bdc3dff1593b66a12e6893689071d1005f737c2c 3462f3f8ba01e66f207415130235586cdcbace59 1 True
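The row counts used to split the file are specific to this one export. A more general approach might locate the line that starts with 'edgedef>' and split there. This is a minimal sketch under some assumptions: the helper name read_gdf is made up, the file is taken to be UTF-8 encoded, and no quoted node field is expected to contain a line that itself starts with 'edgedef>'.

```python
import io
import pandas as pd

def read_gdf(path):
    """Split a GDF file into node and edge DataFrames at the 'edgedef>' line."""
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()
    # index of the header line that opens the edge table
    edge_start = next(i for i, line in enumerate(lines)
                      if line.startswith('edgedef>'))
    nodes = pd.read_csv(io.StringIO(''.join(lines[:edge_start])))
    edges = pd.read_csv(io.StringIO(''.join(lines[edge_start:])))
    return {'nodes': nodes, 'edges': edges}
```

Each half is still parsed by pd.read_csv, so the resulting frames should look the same as the ones above, without the hard-coded 416 and 417.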

The data in the fullstats data frame looks like this:


In [ ]:
ethosstats = pd.read_csv(ethospagedatadir + '/' + 'page_126517697403099_2017_10_31_14_48_28_fullstats.tab', sep='\t')

In [65]:
ethosstats.sample(3)


Out[65]:
type by post_id post_link post_message picture full_picture link link_domain post_published ... comments_replies comment_likes_count rea_NONE rea_LIKE rea_LOVE rea_WOW rea_HAHA rea_SAD rea_ANGRY rea_THANKFUL
23 event post_page_126517697403099 126517697403099_1424530640935125 https://www.facebook.com/126517697403099/posts... Wednesday is time for It Match Making our bi-... https://scontent.xx.fbcdn.net/v/t1.0-0/c106.0.... https://scontent.xx.fbcdn.net/v/t1.0-9/c254.0.... https://www.facebook.com/events/1916608298617207/ facebook.com 2017-10-01T18:00:00+0000 ... 0 0 0 3 0 0 0 0 0 0
19 video post_page_126517697403099 126517697403099_1551983308178905 https://www.facebook.com/126517697403099/posts... Some people mistakenly think that you need pro... https://scontent.xx.fbcdn.net/v/t15.0-10/s130x... https://scontent.xx.fbcdn.net/v/t15.0-10/s720x... https://www.facebook.com/ITUniversityCopenhage... facebook.com 2017-10-08T18:00:00+0000 ... 0 0 0 18 0 1 0 0 0 0
38 status post_page_126517697403099 126517697403099_1396588387062684 https://www.facebook.com/126517697403099/posts... Are you a (new) student at ITU? Then check out... NaN NaN NaN NaN 2017-08-25T05:30:00+0000 ... 0 0 0 4 0 0 0 0 0 0

3 rows × 29 columns

The data in the comments data frame looks like this:


In [ ]:
ethoscomments = pd.read_csv(ethospagedatadir + '/' + 'page_126517697403099_2017_10_31_14_48_28_comments.tab', sep='\t')

In [66]:
ethoscomments.sample(3)


Out[66]:
position post_id post_by post_text post_published comment_id comment_by is_reply comment_message comment_published comment_like_count attachment_type attachment_url
8 22_0 126517697403099_1427070814014441 2cdea889f2ab0c23d54fd66088e7dae5272c065b Congratulations to Jørgen Staunstrup who has ... 2017-10-02T09:54:01+0000 1427070814014441_1427132457341610 f023973a19879cb9b7c553a3d6329d6d0cfcee94 0 He is also an excellent teacher! 2017-10-02T11:46:19+0000 0 NaN NaN
10 26_0 126517697403099_1417639771624212 2cdea889f2ab0c23d54fd66088e7dae5272c065b Great news! As part of a new collaboration wit... 2017-09-20T10:53:53+0000 1417639771624212_1418312011556988 93a6a8cde1db6fbca1d4f6afa07d2978118b0955 0 What about Copenhagen Suborbitals? 😊 2017-09-21T05:55:18+0000 0 NaN NaN
12 36_1 126517697403099_1400408306680692 2cdea889f2ab0c23d54fd66088e7dae5272c065b These students will become pioneers of the fi... 2017-08-29T08:54:44+0000 1400408306680692_1400556329999223 8489c1b9706767146f1dcb32ef72e9f8599a43ce 0 Martin B. Villadsen fotogen much? 2017-08-29T13:27:58+0000 1 NaN NaN

Netvizz is described in Rieder's paper "Studying Facebook via Data Extraction: The Netvizz Application" (WebSci '13).

We can use Tableau, Gephi or other tools to merge and manipulate the data, and Table2Net to derive other graphs from the tables.
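pandas can serve as one of those other tools. Both the fullstats and the comments tables carry a post_id column, so they can be joined on it. The sketch below uses two small made-up frames as stand-ins for ethosstats and ethoscomments; the values and the n_comments column name are illustrative assumptions.

```python
import pandas as pd

# toy stand-ins for ethosstats and ethoscomments; the values are made up
stats = pd.DataFrame({'post_id': ['p1', 'p2', 'p3'],
                      'type': ['video', 'status', 'event']})
comments = pd.DataFrame({'post_id': ['p1', 'p1', 'p3'],
                         'comment_message': ['Nice!', 'Agreed', 'See you there']})

# count the comments collected for each post
n_comments = (comments.groupby('post_id').size()
              .rename('n_comments').reset_index())

# a left merge keeps posts without comments; their count becomes 0
merged = stats.merge(n_comments, on='post_id', how='left')
merged['n_comments'] = merged['n_comments'].fillna(0).astype(int)
```

A left merge is chosen here so that posts nobody commented on stay in the result instead of silently dropping out.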

The point was to bring these possibilities to our project groups and to raise some questions.